Data transformation (I)
Create new variables, filter, select and organize data
An introduction to data management with dplyr
In this lab we are going to cover the very basics of data manipulation and transformation. We will leverage the fundamental functions for data manipulation from the package dplyr to start handling our datasets towards the desired analysis and/or visualization procedures.
Before we start, bear in mind that:
dplyr(nortidyrorpurrr) does not do anything that can’t be done with base R code,applyfamily functions,forloops or other packages.It is designed to be more efficient (in time consume and code lines), easier to read and easier to use. More intuitive to use, specially for beginners (it may require some adaptation if you are used to base R code).
It is Valid mostly for
data.frames. For other formats (matrices, arrays)plyrcan be used instead.
Our data
For the development of this lab we are going to use several data objects from the tibble class:
plots [11858 x 15]: all plots from the Third Spanish Forest Inventory (IFN3) in Cataloniatrees [111756 x 12]: all trees with dbh > 7.5 cm measured in both IFN2 and IFN3species [14778 x 15]: number of trees per hectare in each plot, by species and size classcoordinates [11858 x 6]: X and Y UTM coordinates of each plot.leaf [10447 x 3]: leaf biomass and carbon content for those IFN3 plots where they were available .
So, why tibbles?. Well, because tibbles are not usual data.frames, though they are a very similar data structures (class tbl_df) they facilitate data inspection and, thus, understanding by:
Printing only the first 10 rows by default (equal to
head(x, n =10)).Informing about variable types by default (
str(x)).
Besides that, it is like a data.frame (it is a data.frame)
Let’s have a look at the data
Data inspection must be the first thing we do before starting any kind of analysis. Understanding data is crucial to succeed. The glimpse function is perhaps the most informative way to know what our data looks like at a glance:
glimpse(trees)## Observations: 111,756
## Variables: 10
## $ Codi <fct> 080001, 080002, 080003, 080004, 080006, 080007, 080008, 0...
## $ Provincia <chr> "08", "08", "08", "08", "08", "08", "08", "08", "08", "08...
## $ Especie <fct> 022, 476, 021, 021, 021, 021, 243, 045, 243, 022, 021, 02...
## $ Rumbo <dbl> 7, 38, 25, 28, 19, 32, 40, 16, 47, 44, 13, 9, 9, 25, 199,...
## $ Dist <dbl> 8.3, 9.1, 7.0, 8.9, 11.2, 12.0, 7.8, 5.1, 26.9, 2.7, 8.9,...
## $ N <dbl> 31.8, 31.8, 31.8, 31.8, 14.1, 14.1, 31.8, 31.8, 5.1, 127....
## $ CD <dbl> 20, 35, 25, 15, 35, 35, 15, 20, 65, 15, 20, 30, 45, 20, 3...
## $ DiamIf3 <dbl> 20, 34, 25, 17, 34, 33, 15, 18, 67, 15, 19, 28, 46, 20, 3...
## $ DiamIf2 <dbl> 19, 32, 18, 13, 31, 28, 13, 15, 67, 13, 14, 25, 45, 20, 3...
## $ HeiIf3 <dbl> 9.0, 9.0, 11.0, 9.5, 13.0, 10.0, 6.0, 7.0, 16.5, 9.5, 11....
Nonetheless, we can just invoke the objects to check them:
trees## # A tibble: 111,756 x 10
## Codi Provincia Especie Rumbo Dist N CD DiamIf3 DiamIf2 HeiIf3
## <fct> <chr> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 080001 08 022 7 8.3 31.8 20 20.3 18.9 9
## 2 080002 08 476 38 9.1 31.8 35 34 32.4 9
## 3 080003 08 021 25 7 31.8 25 24.8 17.6 11
## 4 080004 08 021 28 8.89 31.8 15 16.8 12.6 9.5
## 5 080006 08 021 19 11.2 14.1 35 34.0 30.9 13
## 6 080007 08 021 32 12 14.1 35 33.1 28.2 10
## 7 080008 08 243 40 7.8 31.8 15 15 13.2 6
## 8 080009 08 045 16 5.09 31.8 20 17.5 15.3 7
## 9 080010 08 243 47 26.9 5.09 65 67.4 66.8 16.5
## 10 080013 08 022 44 2.7 127. 15 15.1 12.6 9.5
## # ... with 111,746 more rows
plots## # A tibble: 11,858 x 9
## Codi Provincia FccArb FechaIni HoraIni FechaFin
## <fct> <chr> <int> <date> <dttm> <date>
## 1 0800~ 08 70 2001-07-09 2017-11-26 09:44:00 2001-07-09
## 2 0800~ 08 70 2001-08-06 2017-11-26 09:18:58 2001-08-06
## 3 0800~ 08 80 2001-08-06 2017-11-26 12:08:09 2001-08-06
## 4 0800~ 08 50 2001-07-09 2017-11-26 13:23:23 2001-07-09
## 5 0800~ 08 60 2001-08-03 2017-11-26 09:11:28 2001-08-03
## 6 0800~ 08 90 2001-08-01 2017-11-26 13:00:33 2001-08-01
## 7 0800~ 08 90 2001-08-07 2017-11-26 10:08:15 2001-08-07
## 8 0800~ 08 60 2001-08-03 2017-11-26 12:12:03 2001-08-03
## 9 0800~ 08 70 2001-08-02 2017-11-26 09:00:16 2001-08-02
## 10 0800~ 08 80 2001-06-14 2017-11-26 12:34:21 2001-06-14
## # ... with 11,848 more rows, and 3 more variables: HoraFin <dttm>,
## # PhSuelo <int>, FechaPh <date>
species## # A tibble: 14,778 x 15
## # Groups: Codi, Especie [14,778]
## Codi Especie CD_10 CD_15 CD_20 CD_25 CD_30 CD_35 CD_40 CD_45 CD_50 CD_55
## * <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0800~ 022 0 159. 31.8 111. 60.1 19.2 5.09 0 0 0
## 2 0800~ 021 0 0 0 0 0 74.2 28.3 63.7 0 0
## 3 0800~ 022 0 0 0 173. 31.8 0 0 0 0 0
## 4 0800~ 476 0 0 0 0 0 31.8 0 0 0 0
## 5 0800~ 021 0 0 0 31.8 0 0 0 0 5.09 0
## 6 0800~ 022 0 127. 0 0 46.0 127. 0 14.1 0 0
## 7 0800~ 021 0 31.8 0 0 31.8 0 0 0 0 0
## 8 0800~ 071 0 0 0 14.1 5.09 31.8 0 0 0 0
## 9 0800~ 243 0 0 14.1 0 14.1 0 5.09 0 0 0
## 10 0800~ 278 0 286. 31.8 0 0 31.8 0 0 0 0
## # ... with 14,768 more rows, and 3 more variables: CD_60 <dbl>, CD_65 <dbl>,
## # CD_70 <dbl>
class(trees)## [1] "tbl_df" "tbl" "data.frame"
dplyr
dplyr is a essentially a grammar for data manipulation, providing a consistent set of verbs that help you solve the most common data manipulation challenges:
The 5 main verbs of dplyr
filter: keep/select the rows that match a given conditionselect: keep columns by namearrange: sort rows acording to a given variablemutate: transform existent variables or create new onessummarise: do some summary statistics and reduce data
These is a very small subset of what dplyr can do. Just a quick check into its cheatsheet reveals a wide range of possibilities:
Common structure
Most of the tidyverse verbs/functions share certain elements in common and dplyr is certainly not an exception. The basic sintax works like this:
verb(data, ...)- First argument: data (as
data.frameortbl_df). - The rest of arguments specify what to do with the data`.
- Output is always another data frame (
tbl_dfordata.frame). - Unless we are assigning (
<-) the result into an object, functions never modify the original data.
Selecting rows: filter
Use filter() to choose rows/cases/observations where a given condition(s) is TRUE.
The key aspect of
filterlies on building the condition that must be met. Doing so involves the use oflogical operators:
Let’s see a few basic examples applied to the trees table:
- Get trees taller than 6 meters:
filter(trees, HeiIf3 > 6)## # A tibble: 100,247 x 10
## Codi Provincia Especie Rumbo Dist N CD DiamIf3 DiamIf2 HeiIf3
## <fct> <chr> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 080001 08 022 7 8.3 31.8 20 20.3 18.9 9
## 2 080002 08 476 38 9.1 31.8 35 34 32.4 9
## 3 080003 08 021 25 7 31.8 25 24.8 17.6 11
## 4 080004 08 021 28 8.89 31.8 15 16.8 12.6 9.5
## 5 080006 08 021 19 11.2 14.1 35 34.0 30.9 13
## 6 080007 08 021 32 12 14.1 35 33.1 28.2 10
## 7 080009 08 045 16 5.09 31.8 20 17.5 15.3 7
## 8 080010 08 243 47 26.9 5.09 65 67.4 66.8 16.5
## 9 080013 08 022 44 2.7 127. 15 15.1 12.6 9.5
## 10 080014 08 021 13 8.89 31.8 20 19.2 14.2 11.5
## # ... with 100,237 more rows
- Get trees in province 43 (Tarragona):
filter(trees, Provincia == '43')## # A tibble: 9,693 x 10
## Codi Provincia Especie Rumbo Dist N CD DiamIf3 DiamIf2 HeiIf3
## <fct> <chr> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 430003 43 024 6 9.69 31.8 30 28.1 20.4 10.5
## 2 430006 43 023 24 9.69 31.8 30 28.2 23.2 10.5
## 3 430007 43 024 0 8.8 31.8 30 30.4 27.8 10.5
## 4 430009 43 024 15 9.89 31.8 30 30.4 30.4 11
## 5 430010 43 024 33 12.1 14.1 45 47.2 42.0 13.5
## 6 430011 43 044 1 2.09 127. 10 11.0 9.65 6
## 7 430012 43 024 31 13 14.1 35 33.0 26.5 12.5
## 8 430013 43 024 80 1.89 127. 40 38.4 35.7 11
## 9 430014 43 025 4 4.3 127. 15 14.0 11.2 5
## 10 430015 43 024 22 11.2 14.1 30 31.2 27.4 9
## # ... with 9,683 more rows
- Get trees in diametric classes 45 and 45:
filter(trees, CD %in% c(45, 70))## # A tibble: 2,552 x 10
## Codi Provincia Especie Rumbo Dist N CD DiamIf3 DiamIf2 HeiIf3
## <fct> <chr> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 080016 08 071 9 26.9 5.09 45 46.0 45 19
## 2 080113 08 021 19 11.1 14.1 70 72.6 70.7 20.9
## 3 080686 08 042 43 7.09 31.8 45 44.3 42.5 12.5
## 4 080721 08 042 16 15.8 5.09 45 46.2 39.8 17.5
## 5 080743 08 042 3 9.69 31.8 45 46.1 40.7 25.5
## 6 081271 08 024 93 10.8 14.1 45 44.0 39.0 17.3
## 7 081278 08 024 87 10.1 14.1 45 44.6 38.9 15.1
## 8 081354 08 026 16 8.6 31.8 45 43.4 30.8 16.9
## 9 081402 08 054 51 5.19 31.8 70 102. 95.5 17.7
## 10 081943 08 024 1 10.3 14.1 45 43.2 33.8 11.7
## # ... with 2,542 more rows
Exercise 1
Let's find those plots in IFN3n (`plots` data frame) that:
1.1 Are located either in Barcelona (08) or Girona (17)
1.2 Were measured **in** January 2001
1.3 It took **more** than 2 hours to measure (7200s)
Selecting columns: select
Select is the function in dplyr to choose variables (i.e., columns) from a tibble or data.frame. It keeps only the variables you mention.
trees## # A tibble: 111,756 x 10
## Codi Provincia Especie Rumbo Dist N CD DiamIf3 DiamIf2 HeiIf3
## <fct> <chr> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 080001 08 022 7 8.3 31.8 20 20.3 18.9 9
## 2 080002 08 476 38 9.1 31.8 35 34 32.4 9
## 3 080003 08 021 25 7 31.8 25 24.8 17.6 11
## 4 080004 08 021 28 8.89 31.8 15 16.8 12.6 9.5
## 5 080006 08 021 19 11.2 14.1 35 34.0 30.9 13
## 6 080007 08 021 32 12 14.1 35 33.1 28.2 10
## 7 080008 08 243 40 7.8 31.8 15 15 13.2 6
## 8 080009 08 045 16 5.09 31.8 20 17.5 15.3 7
## 9 080010 08 243 47 26.9 5.09 65 67.4 66.8 16.5
## 10 080013 08 022 44 2.7 127. 15 15.1 12.6 9.5
## # ... with 111,746 more rows
- Selecting a single column/variable:
select(trees, DiamIf3)## # A tibble: 111,756 x 1
## DiamIf3
## <dbl>
## 1 20.3
## 2 34
## 3 24.8
## 4 16.8
## 5 34.0
## 6 33.1
## 7 15
## 8 17.5
## 9 67.4
## 10 15.1
## # ... with 111,746 more rows
Selecting all columns but the targeted one. The - operator excludes the targeted variable so that we only keep the remaining variables:
select(trees, -Codi)## # A tibble: 111,756 x 9
## Provincia Especie Rumbo Dist N CD DiamIf3 DiamIf2 HeiIf3
## <chr> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 08 022 7 8.3 31.8 20 20.3 18.9 9
## 2 08 476 38 9.1 31.8 35 34 32.4 9
## 3 08 021 25 7 31.8 25 24.8 17.6 11
## 4 08 021 28 8.89 31.8 15 16.8 12.6 9.5
## 5 08 021 19 11.2 14.1 35 34.0 30.9 13
## 6 08 021 32 12 14.1 35 33.1 28.2 10
## 7 08 243 40 7.8 31.8 15 15 13.2 6
## 8 08 045 16 5.09 31.8 20 17.5 15.3 7
## 9 08 243 47 26.9 5.09 65 67.4 66.8 16.5
## 10 08 022 44 2.7 127. 15 15.1 12.6 9.5
## # ... with 111,746 more rows
- Selecting several columns:
select(trees, DiamIf2, DiamIf3)## # A tibble: 111,756 x 2
## DiamIf2 DiamIf3
## <dbl> <dbl>
## 1 18.9 20.3
## 2 32.4 34
## 3 17.6 24.8
## 4 12.6 16.8
## 5 30.9 34.0
## 6 28.2 33.1
## 7 13.2 15
## 8 15.3 17.5
## 9 66.8 67.4
## 10 12.6 15.1
## # ... with 111,746 more rows
- Selecting a range of columns. The
:operator allows referencing a set of contiguous columns so that all variables contained in that range will be included:
select(trees, Codi:Dist)## # A tibble: 111,756 x 5
## Codi Provincia Especie Rumbo Dist
## <fct> <chr> <fct> <dbl> <dbl>
## 1 080001 08 022 7 8.3
## 2 080002 08 476 38 9.1
## 3 080003 08 021 25 7
## 4 080004 08 021 28 8.89
## 5 080006 08 021 19 11.2
## 6 080007 08 021 32 12
## 7 080008 08 243 40 7.8
## 8 080009 08 045 16 5.09
## 9 080010 08 243 47 26.9
## 10 080013 08 022 44 2.7
## # ... with 111,746 more rows
We can combine
-and:to exclude a range of columns, though we must use avectorto specify the range using parenthesis:
select(trees,-(Codi:Dist))## # A tibble: 111,756 x 5
## N CD DiamIf3 DiamIf2 HeiIf3
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 31.8 20 20.3 18.9 9
## 2 31.8 35 34 32.4 9
## 3 31.8 25 24.8 17.6 11
## 4 31.8 15 16.8 12.6 9.5
## 5 14.1 35 34.0 30.9 13
## 6 14.1 35 33.1 28.2 10
## 7 31.8 15 15 13.2 6
## 8 31.8 20 17.5 15.3 7
## 9 5.09 65 67.4 66.8 16.5
## 10 127. 15 15.1 12.6 9.5
## # ... with 111,746 more rows
We can leverage some special functions especifically design to work withwith strings to enhance column extraction:
starts_with(x): names that start with x.ends_with(x): names that end with x.contains(x): selects all variables whose name contains x.matches(x): selects all variables whose name contains the regular expression x.num_range("x", 1:5, width = 2): selects all variables (numerically) from x01 to x05one_of ("x", "y", "z"): selects variables provided in a character vector.everything(): selects all variables.
- Select columns that start with the string
Diam:
select(trees, starts_with('Diam'))## # A tibble: 111,756 x 2
## DiamIf3 DiamIf2
## <dbl> <dbl>
## 1 20.3 18.9
## 2 34 32.4
## 3 24.8 17.6
## 4 16.8 12.6
## 5 34.0 30.9
## 6 33.1 28.2
## 7 15 13.2
## 8 17.5 15.3
## 9 67.4 66.8
## 10 15.1 12.6
## # ... with 111,746 more rows
Exercise 2
Think of three or four ways to select the variables that define the start and finish date of plot measuring.
Sorting rows: arrange
The last function we are going to cover is arrange. This function allows sorting a data table according to the values of a guiding variable. Sorting can be done in either ascending or descending order, both with numerical (increase or decrease) or alphanumerical (alphabetical order) variables.
As we will see in further sections, sorting data may significantly help in the process of data visualization, and can also be an easy way to inspect the range of data of a particular variable.
- Sort trees ascending by height in the third forest inventory:
arrange(trees, HeiIf3)## # A tibble: 111,756 x 10
## Codi Provincia Especie Rumbo Dist N CD DiamIf3 DiamIf2 HeiIf3
## <fct> <chr> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 171919 17 045 72 4.5 127. 10 10.2 10.4 1.5
## 2 171806 17 046 138 4.5 127. 10 11.9 9.95 1.5
## 3 171049 17 045 331 3.29 127. 10 7.65 7.65 1.5
## 4 080073 08 071 269 3.9 127. 10 10.4 9.2 1.6
## 5 082415 08 045 9 4.69 127. 10 9.7 9.7 1.79
## 6 082451 08 044 33 1 127. 10 7.9 8.1 2
## 7 171959 17 068 17 3.29 127. 10 9.95 8.4 2
## 8 250494 25 042 29 3.09 127. 10 8.45 8.45 2
## 9 251649 25 022 5 5.5 31.8 15 14 14 2
## 10 080664 08 042 49 4.19 127. 10 8.45 8.3 2
## # ... with 111,746 more rows
- Sort them descending:
arrange(trees, desc(Dist))## # A tibble: 111,756 x 10
## Codi Provincia Especie Rumbo Dist N CD DiamIf3 DiamIf2 HeiIf3
## <fct> <chr> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 250008 25 042 398 36.5 5.09 70 74.8 72.3 11.1
## 2 250025 25 031 350 34.8 5.09 70 75.1 71.3 35.7
## 3 250135 25 022 158 33.9 5.09 50 51.8 54.0 9
## 4 250051 25 021 110 33.6 5.09 60 58.2 56.8 6.5
## 5 250025 25 031 238 33.5 5.09 50 51 45.5 20.3
## 6 250010 25 071 214 32.9 5.09 70 85.3 71.6 21
## 7 251788 25 045 163 32.8 5.09 60 61.6 51.7 10.5
## 8 250039 25 031 119 32.5 5.09 65 63.5 62.6 30
## 9 250025 25 031 390 32.1 5.09 45 45.6 43.8 24.6
## 10 171289 17 071 38 31.4 5.09 60 60.8 50.4 26.9
## # ... with 111,746 more rows
Exercise 3
3.1 Sort plots by date and hour of measurement
3.2 Which plots were started to be measured later in the day?
3.3 Which plots took longer to be measured?
Transforming variables: mutate
mutate() allows to create/calculate new columns/variables in our dataset. It is perhaps one of the most used functions given its ability to either create or update variables in a given tibble or data.frame.
The basic syntax of mutate allows to especify the name of variable (variable_name = data_operation) and then the operation by which we assign information to it.
If the name of the variable already exists, then that variable is replaced and updated with the new values. Otherwise, we will create a new variable. Nonetheless, it would be actually replaced unless we use
<-to store the result into an object.
- Replace
Distwith its old value times 100:
mutate(
trees,
Dist = Dist * 100
)## # A tibble: 111,756 x 10
## Codi Provincia Especie Rumbo Dist N CD DiamIf3 DiamIf2 HeiIf3
## <fct> <chr> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 080001 08 022 7 830. 31.8 20 20.3 18.9 9
## 2 080002 08 476 38 910 31.8 35 34 32.4 9
## 3 080003 08 021 25 700 31.8 25 24.8 17.6 11
## 4 080004 08 021 28 889 31.8 15 16.8 12.6 9.5
## 5 080006 08 021 19 1119 14.1 35 34.0 30.9 13
## 6 080007 08 021 32 1200 14.1 35 33.1 28.2 10
## 7 080008 08 243 40 780 31.8 15 15 13.2 6
## 8 080009 08 045 16 509 31.8 20 17.5 15.3 7
## 9 080010 08 243 47 2689 5.09 65 67.4 66.8 16.5
## 10 080013 08 022 44 270 127. 15 15.1 12.6 9.5
## # ... with 111,746 more rows
Note that several operations can be conducted at once just by separating many operations using
,:
mutate(
trees,
Alometria = DiamIf3 / HeiIf3,
Alometria2 = Alometria * DiamIf2
)## # A tibble: 111,756 x 12
## Codi Provincia Especie Rumbo Dist N CD DiamIf3 DiamIf2 HeiIf3
## <fct> <chr> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0800~ 08 022 7 8.3 31.8 20 20.3 18.9 9
## 2 0800~ 08 476 38 9.1 31.8 35 34 32.4 9
## 3 0800~ 08 021 25 7 31.8 25 24.8 17.6 11
## 4 0800~ 08 021 28 8.89 31.8 15 16.8 12.6 9.5
## 5 0800~ 08 021 19 11.2 14.1 35 34.0 30.9 13
## 6 0800~ 08 021 32 12 14.1 35 33.1 28.2 10
## 7 0800~ 08 243 40 7.8 31.8 15 15 13.2 6
## 8 0800~ 08 045 16 5.09 31.8 20 17.5 15.3 7
## 9 0800~ 08 243 47 26.9 5.09 65 67.4 66.8 16.5
## 10 0800~ 08 022 44 2.7 127. 15 15.1 12.6 9.5
## # ... with 111,746 more rows, and 2 more variables: Alometria <dbl>,
## # Alometria2 <dbl>
Similar to select we can bring in any special function to mutate. One useful function is the conditional operator if_else. It allows to implement a conditional structure in which we determine the output value when a condition is met and the value when it happens otherwise:
mutate(
trees,
Especie = if_else(Especie == '021', 'Pinus sylvestris', 'Other')
)## # A tibble: 111,756 x 10
## Codi Provincia Especie Rumbo Dist N CD DiamIf3 DiamIf2 HeiIf3
## <fct> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 080001 08 Other 7 8.3 31.8 20 20.3 18.9 9
## 2 080002 08 Other 38 9.1 31.8 35 34 32.4 9
## 3 080003 08 Pinus sylve~ 25 7 31.8 25 24.8 17.6 11
## 4 080004 08 Pinus sylve~ 28 8.89 31.8 15 16.8 12.6 9.5
## 5 080006 08 Pinus sylve~ 19 11.2 14.1 35 34.0 30.9 13
## 6 080007 08 Pinus sylve~ 32 12 14.1 35 33.1 28.2 10
## 7 080008 08 Other 40 7.8 31.8 15 15 13.2 6
## 8 080009 08 Other 16 5.09 31.8 20 17.5 15.3 7
## 9 080010 08 Other 47 26.9 5.09 65 67.4 66.8 16.5
## 10 080013 08 Other 44 2.7 127. 15 15.1 12.6 9.5
## # ... with 111,746 more rows
Exercise 4
4.1 Get diameter growth (in cm) of each tree between IFN2 (DiamIf2) and IFN3 (DiamIf3)
4.2 Create two new variables with basal area of each tree (in $m^2$ per hectare), both for IFN2 and IFN3. Which is the species of the fastest-growing tree in basal area?
\[ AB = \frac{\pi}{4} · Diam^{2} · N \]